Exploring the Current State of Knowledge on the Link Between COVID-19 and Neurodegeneration

This repository contains the data, scripts, and analyses used in the research titled "Unravelling the Co-Morbidity between COVID-19 and Neurodegenerative Diseases Through Multi-scale Graph Analysis: A Systematic investigation of Biological Databases and Text Mining". The project leverages Neo4j paltform for graph-based analysis and integrates natural language processing to explore relationships between COVID-19 and neurodegenerative diseases (NDDs). The constructed comorbidity database is publicly available in Aura database, with credentials provided below.

Overview

This project explores the connections between COVID-19 and neurodegenerative diseases by:

Integrating database information about COVID-19 and NDDs and storing them in a graph structure.
Extracting textual data from scientific literature and using natural language processing pipelines for information extraction and KG construction.
Loading all KG in Neo4j to identify and analyse relationships and pathways between entities such as genes, diseases, and chemicals.
Construction of a hypothesis database for omorbidity between COVID-19 and NDDs to explore, analyse, and visualise testable comorbidity hypotheses.

Data

The repository includes the following directories:

Expert-curated-publications: Contains manually curated publications relevant to the study, ensuring high-quality and accurate information.
PubTator3-results: Includes results from PubTator3, a web-based system that offers a comprehensive set of features and tools for exploring biomedical literature using advanced text mining and AI techniques.
Sherpa-results: Houses outputs from Sherpa, a tool designed to assist in the curation of biomedical literature by providing automated annotations and insights.
Textual-corpora-for-textmining: Comprises textual corpora prepared for text mining purposes, facilitating the extraction of meaningful patterns and relationships regarding COVID-19 and NDD.
comorbidity_paths.txt: Some example comorbidity pathways.
hypothesis_pmid_evidences.csv: Some of the comorbidity triples with source evidence uploaded to the databases.
comorbidity-db-neo4j.dump: The neo4j dump file of the constructed comorbidity database.

Sources

1. `comorbidity-hypothesis-db.py`

Purpose: Automatically opens the Neo4j Browser with prefilled credentials to connect to the AuraDB instance for comorbidity hypothesis exploration.
Key Features:
- Simplifies connection to Neo4j by generating a pre-configured URL.
- Useful for direct interaction with the knowledge graph.
Usage: Run the script, and the Neo4j Browser will open in your default web browser:
```
python comorbidity-hypothesis-db.py
```

2. `comorbidity_database_neo4j_upload.py`

Purpose: Uploads curated comorbidity hypothesis paths to the Neo4j AuraDB instance.
Key Features:
- Simplifies uploading comorbidity hypothesis candidates.
- Standardizes and normalizes graph entities for compatibility.

Usage: Run the script to upload the data:

python comorbidity_database_neo4j_upload.py

Getting Started

Prerequisites

Neo4j AuraDB: Ensure you have access to a Neo4j AuraDB instance. Use the provided connection details or set up your own.
Python Environment: Install the required libraries:
```
pip install neo4j pandas
```

Notebooks

1. `analyze-neo4j.ipynb`

Purpose: Analyzes the knowledge graph loaded to Neo4j to extract insights.
Key Features:
- Counts nodes and edges in the graph.
- Executes community detection algorithms like Louvain using Neo4j's Graph Data Science (GDS) library.
- Retrieves and visualizes properties of detected clusters
Usage: Open the Jupyter Notebook and follow the instructions to:
- Query the Neo4j database.
- Get general statistics about nodes, triple and pathways, and analyze them.

2. extract-pmids.ipynb

Purpose: Systematically extracts PubMed IDs (PMIDs) from PubMed database using MeSH terms for COVID-19 neurological manifestations research.
Key Features:
- Reads MeSH search terms from external text file (mesh_terms_refined.txt).
- Queries PubMed via NCBI E-utilities API with automated batch processing.
- Filters publications by date range (2021-2024) using publication date [pdat] field.
- Processes XML responses and converts to Python dictionaries using xmltodict.
- Extracts and saves PMIDs for each MeSH term to individual text files.

Usage: Open the Jupyter Notebook and follow the instructions to:

Ensure mesh_terms_refined.txt file contains your MeSH search terms (one per line).
Create results-refined/ directory for output files.
Execute cells to automatically query PubMed for each term and save PMIDs to separate files.
Monitor search progress and review extracted PMID counts for each MeSH term.

Exploring the Covid-NDD Comorbidity Database

To manually explore the comorbidity graph database:

Open the Neo4j Browser:

Navigate to https://browser.neo4j.io.
Enter the Connection Details:
- URI: neo4j+s://09f8d4e9.databases.neo4j.io
- Username: neo4j
- Password: Refer to the credentials provided in the src/comorbidity-hypothesis-db.py.
Run Cypher Queries:

Once connected, you can execute Cypher queries to explore the graph. For example, to retrieve a sample of nodes:
```
MATCH (n) RETURN n LIMIT 10;
```

📄 Archive & Citation

Zenodo accessibility:

📘 DOI: https://doi.org/10.5281/zenodo.1234567

Contact

For any questions, suggestions, or collaborations, please contact:

Negin Babaiha
Email: [email protected]
Google Scholar Profile

Feel free to reach out for discussions regarding the project!

Name		Name	Last commit message	Last commit date
Latest commit History 113 Commits
cypher		cypher
data		data
docker		docker
images		images
notebooks		notebooks
pipeline		pipeline
src		src
README.md		README.md
github-tree.py		github-tree.py
main.py		main.py
pipeline.log		pipeline.log
requirements.txt		requirements.txt
test.ipynb		test.ipynb
tree_output.txt		tree_output.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Exploring the Current State of Knowledge on the Link Between COVID-19 and Neurodegeneration

Table of Contents

Overview

Data

Sources

1. `comorbidity-hypothesis-db.py`

2. `comorbidity_database_neo4j_upload.py`

Getting Started

Prerequisites

Notebooks

1. `analyze-neo4j.ipynb`

2. extract-pmids.ipynb

Exploring the Covid-NDD Comorbidity Database

📄 Archive & Citation

Contact

About

Uh oh!

Releases 2

Packages

Contributors 2

Uh oh!

Languages

SCAI-BIO/covid-NDD-comorbidity-NLP

Folders and files

Latest commit

History

Repository files navigation

Exploring the Current State of Knowledge on the Link Between COVID-19 and Neurodegeneration

Table of Contents

Overview

Data

Sources

1. comorbidity-hypothesis-db.py

2. comorbidity_database_neo4j_upload.py

Getting Started

Prerequisites

Notebooks

1. analyze-neo4j.ipynb

2. extract-pmids.ipynb

Exploring the Covid-NDD Comorbidity Database

📄 Archive & Citation

Contact

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Contributors 2

Uh oh!

Languages

1. `comorbidity-hypothesis-db.py`

2. `comorbidity_database_neo4j_upload.py`

1. `analyze-neo4j.ipynb`

Packages