OmixHub is a Python platform that interfaces with the Genomic Data Commons (GDC) to help users apply ML-based analysis to sequencing data. Currently, only RNA-Seq datasets from GDC are supported.
Cohort Creation of Bulk RNA Seq Tumor and Normal Samples from GDC.
Bioinformatics analysis:
- Application of PyDESeq2 and GSEA in a single pipeline.
Classical ML analysis:
- Applying clustering, supervised ML and outlier sum statistics.
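As an illustration of the outlier sum idea mentioned above, here is a minimal sketch for a single gene using only the Python standard library. It follows one common formulation (standardize by the pooled median and MAD, then sum the tumor values above the third quartile plus one IQR). The function name `outlier_sum` and its arguments are hypothetical and are not taken from OmixHub's code, which may implement the statistic differently.

```python
from statistics import median, quantiles


def outlier_sum(tumor, normal):
    """Outlier sum statistic for one gene (illustrative sketch, not OmixHub's API).

    tumor, normal: sequences of expression values for the same gene.
    """
    pooled = list(normal) + list(tumor)

    # Robust centre and scale computed over all samples.
    med = median(pooled)
    mad = 1.4826 * median([abs(x - med) for x in pooled])
    if mad == 0:
        raise ValueError("MAD is zero; cannot standardize this gene")

    # Standardize every sample, then set the outlier threshold at Q3 + IQR.
    z_pooled = [(x - med) / mad for x in pooled]
    q1, _, q3 = quantiles(z_pooled, n=4, method="inclusive")
    threshold = q3 + (q3 - q1)

    # Sum only the standardized tumor values that exceed the threshold.
    z_tumor = [(x - med) / mad for x in tumor]
    return sum((z for z in z_tumor if z > threshold), 0.0)


# One tumor sample (20) is far above the rest, so it alone contributes.
score = outlier_sum(tumor=[3, 3, 20], normal=[1, 2, 3, 4, 5])
print(round(score, 3))
```

A large score suggests that a subset of tumor samples is strongly over-expressed for that gene, even when the group means are similar.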
Custom API Connections:
- Search and retrieval of cancer data cohorts from GDC using complex JSON filters (methods in src.Connectors for GDC API search and retrieval using custom queries).
- Interacting with a MongoDB database in a Pythonic manner (docs coming soon).
- Interacting with Google Cloud BigQuery in a Pythonic manner (docs coming soon).
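To give a sense of what a "complex JSON filter" for GDC looks like, the sketch below builds a nested filter and encodes it into a query URL for the public GDC files endpoint. The helper names `build_rna_seq_filter` and `build_query_url` are hypothetical and are not the methods in src.Connectors. The field names follow the public GDC API conventions, but which fields OmixHub actually queries is an assumption.

```python
import json
from urllib.parse import urlencode

# Public GDC files endpoint.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_rna_seq_filter(primary_site):
    """Nested GDC filter: RNA-Seq expression files for one primary site.

    Hypothetical helper; field choices are assumptions for illustration.
    """
    return {
        "op": "and",
        "content": [
            {"op": "in", "content": {"field": "cases.project.primary_site",
                                     "value": [primary_site]}},
            {"op": "in", "content": {"field": "files.experimental_strategy",
                                     "value": ["RNA-Seq"]}},
            {"op": "in", "content": {"field": "files.data_type",
                                     "value": ["Gene Expression Quantification"]}},
        ],
    }


def build_query_url(filters, fields, size=10):
    """Encode the filter and requested fields as GET parameters."""
    params = {
        "filters": json.dumps(filters),
        "fields": ",".join(fields),
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_FILES_ENDPOINT}?{urlencode(params)}"


kidney_filter = build_rna_seq_filter("Kidney")
url = build_query_url(
    kidney_filter,
    fields=["file_id", "file_name", "cases.samples.sample_type"],
)
print(url)
```

The resulting URL can be fetched with any HTTP client. Keeping filter construction separate from the request makes it easy to inspect or reuse the filter before sending it.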
- Clone the repository
git clone https://github.com/adhal007/OmixHub.git
- Create the conda environment for OmixHub:
conda env create -f environment.yaml
- RNA Seq Cohort Creation of tumor and normal samples by primary site
- Example Jupyter Notebook
- Code:
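# grequests is imported first, as in the original snippet
import grequests
import src.Engines.gdc_engine as gdc_engine
from importlib import reload

# Reload the engine module so that local edits are picked up in a notebook session
reload(gdc_engine)

# NOTE: gdc_eng_inst must be an engine instance created from gdc_engine before this point;
# its construction is not shown in this snippet.

## Create dataset for differential gene expression
rna_seq_DGE_data = gdc_eng_inst.run_rna_seq_data_matrix_creation(primary_site='Kidney', downstream_analysis='DE')

## Create dataset for machine learning analysis
rna_seq_ML_data = gdc_eng_inst.run_rna_seq_data_matrix_creation(primary_site='Kidney', downstream_analysis='ML')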
- Differential gene expression (DGE) + gene set enrichment analysis (GSEA) for tumor vs. normal samples
- Using the Gradio app for DGE + GSEA:
- Currently this is restricted to users. If you want to try it out or contribute, reach out to me via [email protected] with your interest.
- Running the app:
- After completing the getting-started steps (cloning the repository and creating the conda environment), continue with the steps below.
- Run the Gradio app:
python3 app_gradio.py
- Check out the app navigation documentation.