curatedTCGAData is an experiment data package in both release and development versions of Bioconductor. It makes use of ExperimentHub to access pre-processed and curated data from The Cancer Genome Atlas (TCGA) as MultiAssayExperiment objects.
The clinical datasets taken from TCGA contain many variables, including demographic and pathology variables. Curation merged in additional level one data and subtype information. Empty variables were removed, and their names were saved in the colData metadata. Ongoing efforts include merging the different levels of variables in the colData, thereby reducing the repetition of some clinical variables.
Across the 33 TCGA cohorts, the primary publications detail various molecular subtypes (methylation, mRNA, etc.). Currently, no publicly available datasets contain clinical subtype information. As such, we have integrated both clinical and molecular subtype information by curating the clinical variables as detailed above and incorporating subtype information from the supplements of the primary publications. All subtype curation was done by hand; where supplemental information was not available in a publication, the corresponding author was emailed and asked to provide it. With the addition of the molecular subtype information, it becomes possible to examine subtype characterization across cohorts, which will hopefully provide deeper insight into oncogenesis.
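The curated clinical and subtype information described above can be inspected roughly as in the following sketch. It assumes that the companion package TCGAutils is installed and provides `getClinicalNames()` and `getSubtypeMap()`. The cohort code "ACC", the assay name "RPPAArray", and the data version "2.0.1" are illustrative choices, not recommendations, and the names stored in the colData metadata may differ between data versions.

```r
library(curatedTCGAData)
library(MultiAssayExperiment)
library(TCGAutils)  # assumed to be installed separately

# Download one small assay for a single cohort (illustrative choices)
acc <- curatedTCGAData(
    diseaseCode = "ACC",
    assays = "RPPAArray",
    version = "2.0.1",   # assumed data version
    dry.run = FALSE
)

# Curated clinical variables are stored in the colData
clin <- colData(acc)
dim(clin)

# Extra curation information is attached as metadata on the colData
names(S4Vectors::metadata(clin))

# Commonly used clinical variable names for this cohort
head(getClinicalNames("ACC"))

# Mapping between curated subtype labels and colData column names
subtype_map <- getSubtypeMap(acc)
head(subtype_map)
```

Restricting the request to a single assay keeps the download small while still returning the full set of clinical variables.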
See the NCI wiki and summary on FireHose for information on genome builds for all aligned data types.
Install curatedTCGAData from Bioconductor using BiocManager:
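# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
library(BiocManager)
# Use the development version of Bioconductor; skip this line to stay on release
install(version = "devel")
install("curatedTCGAData")
# Open the package vignettes
browseVignettes("curatedTCGAData")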
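Once the package is installed, a data request might look like the following minimal sketch. It assumes the `curatedTCGAData()` function with its `diseaseCode`, `assays`, `version`, and `dry.run` arguments. The cohort code "ACC", the assay name "RPPAArray", and the version string "2.0.1" are illustrative and may not match the data versions available in your installation.

```r
library(curatedTCGAData)

# List the resources available for one cohort without downloading anything
curatedTCGAData(
    diseaseCode = "ACC",
    assays = "*",
    version = "2.0.1",   # assumed data version
    dry.run = TRUE
)

# Download a single assay and assemble it into a MultiAssayExperiment
acc_rppa <- curatedTCGAData(
    diseaseCode = "ACC",
    assays = "RPPAArray",
    version = "2.0.1",
    dry.run = FALSE
)

# Print a summary of the experiments and sample information
acc_rppa
```

Running with `dry.run = TRUE` first is a cheap way to check which assays exist for a cohort before committing to a download through ExperimentHub.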
We appreciate all feedback on our experiment data package. Please file an issue on GitHub and we will respond as soon as possible.