MultiOmics-Cancer-Classification

A pipeline for dimensionality reduction and resampling of TCGA data, developed for the study "Evaluation of machine learning-based cancer classification techniques using multi-omics data."

Description

The R Markdown files and Python scripts in this repository provide a pipeline for transforming and preparing multi-omics datasets: genomic (copy number), transcriptomic (gene expression), and epigenomic (DNA methylation) data.

  • R Markdown files that apply PCA to the cleaned TCGA data
  • R Markdown files that apply Tomek Links and Near Miss undersampling to address class imbalance
  • An alternative Python script for Tomek Links
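
To illustrate the idea behind the Tomek Links step, here is a minimal pure-Python sketch. It is not the repository's script: the function names (`nearest_neighbor`, `tomek_link_undersample`), the toy data, and the choice to drop only the majority-class member of each link are assumptions made for illustration. In practice a library implementation such as `TomekLinks` from imbalanced-learn would likely be the better choice.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_neighbor(i, X):
    """Return the index of the sample closest to X[i], excluding i itself."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: dist(X[i], X[j]))


def tomek_link_undersample(X, y, majority_label):
    """Drop majority-class samples that take part in a Tomek link.

    A Tomek link is a pair of samples with different labels that are
    each other's nearest neighbour.
    """
    nn = [nearest_neighbor(i, X) for i in range(len(X))]
    drop = set()
    for i, j in enumerate(nn):
        is_mutual = nn[j] == i
        if is_mutual and y[i] != y[j] and y[i] == majority_label:
            drop.add(i)
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: class "A" is the majority, and the sample at
# (1.0, 1.0) sits right next to a class "B" sample.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (1.0, 1.0), (1.1, 1.0), (3.0, 3.0)]
y = ["A", "A", "A", "A", "B", "B"]

X_res, y_res = tomek_link_undersample(X, y, majority_label="A")
print(len(X), "->", len(X_res), "samples")
```

On this toy data the only Tomek link is the pair (1.0, 1.0) and (1.1, 1.0), so the majority-class sample at (1.0, 1.0) is removed and the class boundary becomes cleaner. The brute-force neighbour search is quadratic in the number of samples, which is fine for a sketch but too slow for full TCGA matrices.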

Dataset

The cleaned TCGA data is available here or as a Hugging Face dataset.