Python package wrapping ENCODE epigenomic data for several reference cell lines.
As usual, just install it using pip:
pip install epigenomic_dataset
TODO: THE FOLLOWING SECTION WILL NEED RESTRUCTURING IN A LITTLE BIT!
We have already downloaded the data and computed the maximum window value for each promoter and enhancer region, for the cell lines A549, GM12878, H1, HEK293, HepG2, K562 and MCF-7 in the Fantom dataset and for the cell lines A549, GM12878, H1, HepG2 and K562 in the Roadmap dataset, taking into consideration all the target features listed in the complete table of epigenomes.
The thresholds used to classify the activation of enhancers and promoters in Fantom are the defaults described in the sister pipeline CRR labels, which handles the download and preprocessing of the data from Fantom and Roadmap.
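To make "max window value" concrete, here is a minimal sketch that assumes it means the maximum of a per-base signal inside each region. The function name `max_window_values`, the toy coverage values and the window coordinates are all hypothetical and are not part of the package.

```python
from typing import List, Sequence, Tuple


def max_window_values(
    signal: Sequence[float],
    regions: List[Tuple[int, int]],
) -> List[float]:
    """Return the maximum signal value inside each half-open [start, end) region."""
    return [max(signal[start:end]) for start, end in regions]


# Toy per-base coverage track (made-up values, for illustration only).
coverage = [0.0, 0.5, 2.0, 1.0, 0.0, 3.5, 0.2, 0.1]
# Two hypothetical windows of size 4.
windows = [(0, 4), (4, 8)]

print(max_window_values(coverage, windows))
```

Each region collapses to a single number per epigenomic feature, which is why every row of the resulting dataset can describe one promoter or enhancer.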
Dataset | Assembly | Window Size | Region | Cell line | Download URL |
---|---|---|---|---|---|
fantom | hg38 | 256 | promoters | GM12878 | Download |
fantom | hg38 | 256 | promoters | A549 | Download |
fantom | hg38 | 256 | promoters | HEK293 | Download |
fantom | hg38 | 256 | promoters | HepG2 | Download |
fantom | hg38 | 256 | promoters | K562 | Download |
fantom | hg38 | 256 | promoters | H1 | Download |
fantom | hg38 | 256 | promoters | MCF-7 | Download |
fantom | hg38 | 256 | enhancers | GM12878 | Download |
fantom | hg38 | 256 | enhancers | A549 | Download |
fantom | hg38 | 256 | enhancers | HEK293 | Download |
fantom | hg38 | 256 | enhancers | HepG2 | Download |
fantom | hg38 | 256 | enhancers | K562 | Download |
fantom | hg38 | 256 | enhancers | H1 | Download |
fantom | hg38 | 256 | enhancers | MCF-7 | Download |
fantom | hg38 | 128 | promoters | GM12878 | Download |
fantom | hg38 | 128 | promoters | A549 | Download |
fantom | hg38 | 128 | promoters | HEK293 | Download |
fantom | hg38 | 128 | promoters | HepG2 | Download |
fantom | hg38 | 128 | promoters | K562 | Download |
fantom | hg38 | 128 | promoters | H1 | Download |
fantom | hg38 | 128 | promoters | MCF-7 | Download |
fantom | hg38 | 128 | enhancers | GM12878 | Download |
fantom | hg38 | 128 | enhancers | A549 | Download |
fantom | hg38 | 128 | enhancers | HEK293 | Download |
fantom | hg38 | 128 | enhancers | HepG2 | Download |
fantom | hg38 | 128 | enhancers | K562 | Download |
fantom | hg38 | 128 | enhancers | H1 | Download |
fantom | hg38 | 128 | enhancers | MCF-7 | Download |
fantom | hg38 | 64 | promoters | GM12878 | Download |
fantom | hg38 | 64 | promoters | A549 | Download |
fantom | hg38 | 64 | promoters | HEK293 | Download |
fantom | hg38 | 64 | promoters | HepG2 | Download |
fantom | hg38 | 64 | promoters | K562 | Download |
fantom | hg38 | 64 | promoters | H1 | Download |
fantom | hg38 | 64 | promoters | MCF-7 | Download |
fantom | hg38 | 64 | enhancers | GM12878 | Download |
fantom | hg38 | 64 | enhancers | A549 | Download |
fantom | hg38 | 64 | enhancers | HEK293 | Download |
fantom | hg38 | 64 | enhancers | HepG2 | Download |
fantom | hg38 | 64 | enhancers | K562 | Download |
fantom | hg38 | 64 | enhancers | H1 | Download |
fantom | hg38 | 64 | enhancers | MCF-7 | Download |
fantom | hg38 | 1024 | promoters | GM12878 | Download |
fantom | hg38 | 1024 | promoters | A549 | Download |
fantom | hg38 | 1024 | promoters | HEK293 | Download |
fantom | hg38 | 1024 | promoters | HepG2 | Download |
fantom | hg38 | 1024 | promoters | K562 | Download |
fantom | hg38 | 1024 | promoters | H1 | Download |
fantom | hg38 | 1024 | promoters | MCF-7 | Download |
fantom | hg38 | 1024 | enhancers | GM12878 | Download |
fantom | hg38 | 1024 | enhancers | A549 | Download |
fantom | hg38 | 1024 | enhancers | HEK293 | Download |
fantom | hg38 | 1024 | enhancers | HepG2 | Download |
fantom | hg38 | 1024 | enhancers | K562 | Download |
fantom | hg38 | 1024 | enhancers | H1 | Download |
fantom | hg38 | 1024 | enhancers | MCF-7 | Download |
fantom | hg38 | 512 | promoters | GM12878 | Download |
fantom | hg38 | 512 | promoters | A549 | Download |
fantom | hg38 | 512 | promoters | HEK293 | Download |
fantom | hg38 | 512 | promoters | HepG2 | Download |
fantom | hg38 | 512 | promoters | K562 | Download |
fantom | hg38 | 512 | promoters | H1 | Download |
fantom | hg38 | 512 | promoters | MCF-7 | Download |
fantom | hg38 | 512 | enhancers | GM12878 | Download |
fantom | hg38 | 512 | enhancers | A549 | Download |
fantom | hg38 | 512 | enhancers | HEK293 | Download |
fantom | hg38 | 512 | enhancers | HepG2 | Download |
fantom | hg38 | 512 | enhancers | K562 | Download |
fantom | hg38 | 512 | enhancers | H1 | Download |
fantom | hg38 | 512 | enhancers | MCF-7 | Download |
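The table above is the full cross product of one dataset, five window sizes, two region types and seven cell lines. The sketch below enumerates those combinations so that a request can be checked before attempting a download. The helper `is_listed` is hypothetical, and the cell line spellings are copied from the table; whether the package expects exactly these strings is an assumption.

```python
from itertools import product

# Values copied from the table of precomputed Fantom datasets (assembly hg38).
DATASETS = ("fantom",)
WINDOW_SIZES = (64, 128, 256, 512, 1024)
REGIONS = ("promoters", "enhancers")
CELL_LINES = ("A549", "GM12878", "H1", "HEK293", "HepG2", "K562", "MCF-7")

# Every (dataset, window size, region, cell line) row listed in the table.
AVAILABLE = set(product(DATASETS, WINDOW_SIZES, REGIONS, CELL_LINES))


def is_listed(dataset: str, window_size: int, region: str, cell_line: str) -> bool:
    """Hypothetical helper: True if the combination appears in the table."""
    return (dataset, window_size, region, cell_line) in AVAILABLE


print(len(AVAILABLE))
print(is_listed("fantom", 256, "promoters", "K562"))
print(is_listed("fantom", 200, "promoters", "K562"))
```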
Here are the labels for all the considered cell lines.
Dataset | Promoters | | Enhancers | |
---|---|---|---|---|
Fantom | 200 | 1000 | 200 | 1000 |
Roadmap | 200 | 1000 | 200 | 1000 |
TODO: align promoters and enhancers in a reference labels dataset.
The complete pipeline used to retrieve the CRR epigenomic data is available here.
You can automatically retrieve the data as follows:
from epigenomic_dataset import load_epigenomes

X, y = load_epigenomes(
    cell_line="K562",
    dataset="fantom",
    region="promoters",
    window_size=256,
    root="datasets"  # Path where to download data
)
The raw data considered come from this query to the ENCODE project.
You can find the complete table of the available epigenomes here. These datasets were selected to have (at the time of writing, 07/02/2020) the fewest known problems, such as low read resolution.
You can run the pipeline as follows. Suppose you want to extract the epigenomic features for the cell lines HepG2 and H1:
from epigenomic_dataset import build

build(
    bed_path="path/to/my/bed/file.bed",
    cell_lines=["HepG2", "H1"]
)
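If you do not yet have a bed file to pass as `bed_path`, the following sketch writes a minimal one using the standard three-column BED layout (chromosome, 0-based start, exclusive end, tab-separated). The helper `write_bed` and the coordinates are hypothetical, and it is an assumption that `build` accepts a plain three-column file without a header.

```python
import csv
import os
import tempfile
from typing import Iterable, Tuple


def write_bed(path: str, regions: Iterable[Tuple[str, int, int]]) -> None:
    """Write (chrom, start, end) tuples as a tab-separated, headerless BED file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for chrom, start, end in regions:
            # BED intervals are 0-based and half-open, so end must exceed start.
            if start < 0 or end <= start:
                raise ValueError(f"Invalid region: {chrom}:{start}-{end}")
            writer.writerow([chrom, start, end])


# Made-up regions of width 256, for illustration only.
regions = [("chr1", 10000, 10256), ("chr2", 50000, 50256)]
bed_path = os.path.join(tempfile.mkdtemp(), "regions.bed")
write_bed(bed_path, regions)

with open(bed_path) as handle:
    print(handle.read())
```

The resulting path can then be used in place of `"path/to/my/bed/file.bed"` in the call to `build`.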
If you want to specify where to store the files, use:
from epigenomic_dataset import build

build(
    bed_path="path/to/my/bed/file.bed",
    cell_lines=["HepG2", "H1"],
    path="path/to/my/target"
)
By default, the downloaded bigWig files are not deleted. You can choose to delete the files as follows:
from epigenomic_dataset import build

build(
    bed_path="path/to/my/bed/file.bed",
    cell_lines=["HepG2", "H1"],
    path="path/to/my/target",
    clear_download=True
)