doi: https://doi.org/10.1101/2020.11.24.395913
This repository contains code and data used for the analysis performed in the paper.
The analysis is carried out in a series of Jupyter notebooks (Python and R). There are two ways to run them:
1. Docker image: includes all the notebooks, the pre-required data* and the libraries used.
2. Alternatively, download all the notebooks from this repository along with data.tar.lrz, and run them with the required packages installed.
All notebooks are documented, so you can check which data each one needs (and which folder it should be in) and which output it generates.
The Docker image contains all the required R and Python libraries, along with the code needed to generate all result files and figures.
## to pull the docker image
docker pull bhavya2017/p53_mutant_atac:latest
## to run the docker image
docker run --rm -p {port}:8888 -e JUPYTER_ENABLE_LAB=yes -v {path_to_your_working_directory}:/home/jovyan/work bhavya2017/p53_mutant_atac:latest
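If you prefer to launch the container from a script, a small Python helper can fill in the `{port}` and `{path_to_your_working_directory}` placeholders of the command above. This is a sketch, not part of the repository: the function name `docker_run_args` is hypothetical, and it only builds the argument list.

```python
# Hypothetical helper (not part of the repository) that fills the docker run placeholders.
import os
import shlex
from typing import List


def docker_run_args(port: int, workdir: str,
                    image: str = "bhavya2017/p53_mutant_atac:latest") -> List[str]:
    """Build the `docker run` argument list with {port} and the working directory filled in."""
    # Resolve to an absolute path: a bare name such as "work" would be read as a named volume.
    host_dir = os.path.abspath(workdir)
    return [
        "docker", "run", "--rm",
        "-p", f"{port}:8888",                   # host port -> JupyterLab port inside the container
        "-e", "JUPYTER_ENABLE_LAB=yes",
        "-v", f"{host_dir}:/home/jovyan/work",  # mount the working directory into the container
        image,
    ]


if __name__ == "__main__":
    # Print the command for the current directory on host port 8888.
    print(shlex.join(docker_run_args(8888, ".")))
```

To actually start JupyterLab, pass the list to `subprocess.run(args, check=True)`; building a list rather than a single string avoids quoting problems when the path contains spaces.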
If you are not using Docker to run the notebooks, create the following directories in your working directory:
### Extract the data folder
lrzuntar data.tar.lrz
mkdir Results
mkdir plots
mkdir profile
mkdir Results/p53retriever
mkdir profile/TF
mkdir profile/histone
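The same folder layout can also be created from Python, for example in a notebook cell. Unlike the plain `mkdir` calls, this sketch can be rerun without failing on folders that already exist. The function name `make_output_dirs` is hypothetical; the folder list mirrors the commands above.

```python
# Hypothetical helper (not part of the repository) that mirrors the mkdir commands above.
from pathlib import Path

# Output directories expected by the notebooks.
OUTPUT_DIRS = [
    "Results",
    "plots",
    "profile",
    "Results/p53retriever",
    "profile/TF",
    "profile/histone",
]


def make_output_dirs(base: str = ".") -> list:
    """Create the output folders under `base`; safe to call more than once."""
    created = []
    for rel in OUTPUT_DIRS:
        path = Path(base) / rel
        # parents=True creates missing parent folders; exist_ok=True skips existing ones.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    for p in make_output_dirs():
        print(p)
```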
./data: all input data for the analyses: limma, JASPAR, histone, HOMER de novo motifs, non-B DNA form elements, expression (RSEM), gene copy number and cancer driver genes
./plots: all figures created by the notebooks
./profile: files used to create profile plots with deepTools (plotHeatmap and plotProfile)
./Results: output files from limma, HOMER input files, and UpSet and TCGAbiolinks results
Run the notebooks in the order given below, as they are interdependent:
1_Pre-processing: Python notebook that creates the input files for the limma differential accessibility analysis. The required raw data can either be downloaded from the GDC or taken from ./data ({BRCA/COAD}_rawcounts.tsv)
2_Differential_ACC: R notebook that performs the differential accessibility analysis; requires the condition file and the raw counts file
3_Differential_exp: R notebook that runs a differential expression analysis through TCGAbiolinks for the samples from the above analysis that have expression data
4_Segment_copy_number: R notebook that fetches the segment copy number scores for the samples used in the accessibility analysis
5_p53RE: R notebook that finds p53 response elements (p53REs) in the significant regions identified in 2_Differential_ACC. Its input is the FASTA sequence of each region
6_BRCA_analysis: Python notebook that explores and annotates the significant regions found in breast (BRCA) samples (main figures)
7_COAD_analysis: Python notebook that explores and annotates the significant regions found in colon (COAD) samples (main figures)
8_Upset_plot: R notebook that plots Figures 1E and 1F for breast and colon
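2_Differential_ACC needs a condition file alongside the raw counts, and in limma the rows of the design matrix must follow the same sample order as the count columns. The sketch below shows one way a pre-processing step could write such a file. It assumes a tab-separated counts file with region IDs in the first column and sample IDs as the remaining headers, and that each sample's TP53 status is already known; the function name, the `sample`/`condition` column names and the status labels are hypothetical, not the notebook's actual format.

```python
# Hypothetical sketch of a condition-file writer; names and file layout are assumptions.
import csv
from typing import Dict, List


def write_condition_file(counts_path: str, status: Dict[str, str],
                         out_path: str) -> List[str]:
    """Write a sample/condition table in the same order as the count columns.

    Assumes the first header field is the region-ID column and the rest are
    sample IDs. Samples without a known status are skipped, so the count
    matrix must be subset to the returned IDs as well.
    """
    with open(counts_path, newline="") as fh:
        header = next(csv.reader(fh, delimiter="\t"))
    kept = [sample for sample in header[1:] if sample in status]
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(["sample", "condition"])  # hypothetical column names
        for sample in kept:
            writer.writerow([sample, status[sample]])
    return kept
```

Keeping the count-column order, rather than the order of the status mapping, is what keeps each condition label aligned with the right sample.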
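The later notebooks work on the significant regions from 2_Differential_ACC. limma's `topTable` reports, among other columns, `logFC` and `adj.P.Val` (Benjamini-Hochberg adjusted by default). The sketch below filters result rows that have already been loaded as dictionaries keyed by column name; the function name and the 0.05 FDR and |logFC| >= 1 cut-offs are illustrative assumptions, not the thresholds used in the paper.

```python
# Hypothetical filter over limma topTable rows; the thresholds are illustrative only.
from typing import Dict, List


def significant_regions(rows: List[Dict[str, str]], fdr: float = 0.05,
                        min_abs_logfc: float = 1.0) -> List[Dict[str, str]]:
    """Keep rows passing an adjusted p-value and effect-size cut-off."""
    hits = []
    for row in rows:
        adj_p = float(row["adj.P.Val"])
        logfc = float(row["logFC"])
        if adj_p < fdr and abs(logfc) >= min_abs_logfc:
            hits.append(row)
    # Strongest evidence (smallest adjusted p-value) first.
    return sorted(hits, key=lambda r: float(r["adj.P.Val"]))
```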
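3_Differential_exp only uses samples that also have expression data. Assuming both data sets are labelled with TCGA barcodes, one way to find those samples is to compare barcodes truncated to 15 characters (project, tissue source site, participant and sample type, e.g. `TCGA-AA-0001-01`). The function names and example barcodes are hypothetical, and the notebook may match samples differently.

```python
# Hypothetical sample matching on truncated TCGA barcodes; the 15-character cut is an assumption.
from typing import Iterable, List


def sample_key(barcode: str) -> str:
    """Truncate a TCGA barcode to project-TSS-participant-sample type, e.g. 'TCGA-AA-0001-01'."""
    return barcode[:15]


def samples_with_expression(atac_ids: Iterable[str],
                            rnaseq_barcodes: Iterable[str]) -> List[str]:
    """Return the accessibility samples whose sample-level key also occurs in the RNA-seq data."""
    rna_keys = {sample_key(b) for b in rnaseq_barcodes}
    return [s for s in atac_ids if sample_key(s) in rna_keys]
```

Truncating to 15 characters keeps tumour (`-01`) and normal (`-11`) samples of the same participant apart, which truncating to the 12-character participant ID would not.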
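For 4_Segment_copy_number, GDC copy number segment files list segments with a chromosome, start, end and a `Segment_Mean` value (a log2 ratio). The sketch below derives a score for one region as the overlap-weighted mean of the segments covering it. The tuple layout, the 1-based inclusive coordinates, the weighting scheme and the function names are assumptions for illustration, not necessarily what the notebook does.

```python
# Hypothetical region-level copy number score; layout and weighting are assumptions.
from typing import List, Optional, Tuple

# (chromosome, start, end, segment_mean); coordinates assumed 1-based and inclusive.
Segment = Tuple[str, int, int, float]


def _norm(chrom: str) -> str:
    """Treat 'chr1' and '1' as the same chromosome."""
    return chrom[3:] if chrom.startswith("chr") else chrom


def region_copy_number(segments: List[Segment], chrom: str,
                       start: int, end: int) -> Optional[float]:
    """Overlap-weighted mean Segment_Mean across the region, or None if nothing covers it."""
    total_bp = 0
    weighted = 0.0
    for seg_chrom, seg_start, seg_end, seg_mean in segments:
        if _norm(seg_chrom) != _norm(chrom):
            continue
        # Number of shared bases between two inclusive intervals.
        overlap = min(end, seg_end) - max(start, seg_start) + 1
        if overlap > 0:
            total_bp += overlap
            weighted += overlap * seg_mean
    return weighted / total_bp if total_bp else None
```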
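5_p53RE presumably relies on p53retriever (listed among the R packages), which scores candidate response elements with a more permissive model than a plain consensus match. For intuition only, the sketch below scans a sequence for the classic strict p53 consensus: two RRRCWWGYYY half-sites separated by a 0 to 13 bp spacer. It finds perfect-consensus sites only and is not p53retriever's algorithm; the function name is hypothetical.

```python
# Strict consensus scan for p53 REs; an illustration, not p53retriever's scoring model.
import re
from typing import List, Tuple

# One half-site: RRRCWWGYYY (R = A/G, W = A/T, Y = C/T).
HALF_SITE = "[AG]{3}C[AT]{2}G[CT]{3}"
# Full response element: two half-sites separated by a 0-13 bp spacer.
P53_RE = re.compile(HALF_SITE + "[ACGT]{0,13}" + HALF_SITE)


def find_p53_res(seq: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, site) for consensus p53 REs (0-based, end-exclusive)."""
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in P53_RE.finditer(seq)]
```

Scanning one strand is enough here because the half-site consensus is its own reverse complement.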
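6_BRCA_analysis and 7_COAD_analysis annotate the significant regions. A common annotation is the nearest gene transcription start site (TSS); the sketch below does this with `bisect`, assuming a list of (chromosome, gene name, TSS) tuples from whichever gene annotation you use. Distances are measured from the region midpoint and are not strand-aware; the function names and gene names are placeholders.

```python
# Hypothetical nearest-TSS annotation; input layout and names are placeholders.
import bisect
from typing import Dict, List, Optional, Tuple

TssIndex = Dict[str, Tuple[List[int], List[str]]]


def build_tss_index(genes: List[Tuple[str, str, int]]) -> TssIndex:
    """Group (chrom, gene, tss) tuples into per-chromosome sorted TSS positions and names."""
    by_chrom: Dict[str, List[Tuple[int, str]]] = {}
    for chrom, name, tss in genes:
        by_chrom.setdefault(chrom, []).append((tss, name))
    index: TssIndex = {}
    for chrom, items in by_chrom.items():
        items.sort()
        index[chrom] = ([t for t, _ in items], [n for _, n in items])
    return index


def nearest_gene(index: TssIndex, chrom: str, start: int,
                 end: int) -> Optional[Tuple[str, int]]:
    """Return (gene, TSS - midpoint) for the TSS closest to the region midpoint."""
    if chrom not in index:
        return None
    positions, names = index[chrom]
    mid = (start + end) // 2
    i = bisect.bisect_left(positions, mid)
    # The nearest TSS is either just before or just after the insertion point.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(positions)]
    best = min(candidates, key=lambda j: abs(positions[j] - mid))
    return names[best], positions[best] - mid
```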
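8_Upset_plot draws its figures with UpSetR. In an UpSet plot the intersection bars typically count elements that belong to exactly one combination of sets (exclusive intersections), so each element is counted once. The sketch below computes those counts from named sets, for example sets of significant regions; the function name and set names are placeholders.

```python
# Exclusive (UpSet-style) intersection counts; names are placeholders.
from collections import Counter
from typing import Dict, Set


def upset_counts(sets: Dict[str, Set[str]]) -> Counter:
    """Count elements per exact combination of sets they belong to."""
    membership: Dict[str, set] = {}
    for set_name, elements in sets.items():
        for element in elements:
            membership.setdefault(element, set()).add(set_name)
    # Each element contributes to exactly one combination.
    return Counter(frozenset(names) for names in membership.values())
```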
Enhancer
To obtain results for enhancer interactions in the significant peaks, the GeneHancer dataset is required. It can be obtained as follows:
1. Go to the UCSC Table Browser: https://genome.ucsc.edu/cgi-bin/hgTables.
2. Upload the significant peaks in BED format.
3. Select the following: track: Genehancer Regulatory Interaction cluster view (double elite); assembly: hg38; output format: BED.
4. Download the output and copy it into the ./data directory:
cp {file_name} ./data/gene_enhancer_2017.bed_hg38.bed
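Step 2 needs the significant peaks as a BED file. If your peaks are identified by IDs such as `chr1:1001-2000` (a hypothetical format, assumed here to be 1-based and inclusive), they can be converted to BED's 0-based, half-open coordinates as sketched below; drop the `- 1` if your IDs already use BED coordinates. The function name is hypothetical.

```python
# Hypothetical conversion of 'chrom:start-end' IDs (assumed 1-based, inclusive) to BED lines.
import re
from typing import Iterable, List

REGION = re.compile(r"^(chr\w+):(\d+)-(\d+)$")


def regions_to_bed(region_ids: Iterable[str]) -> List[str]:
    """Return tab-separated BED lines (chrom, start, end, name)."""
    lines = []
    for rid in region_ids:
        m = REGION.match(rid)
        if m is None:
            raise ValueError(f"unrecognised region ID: {rid}")
        chrom, start, end = m.group(1), int(m.group(2)), int(m.group(3))
        # BED starts are 0-based; an inclusive 1-based end equals the half-open BED end.
        lines.append(f"{chrom}\t{start - 1}\t{end}\t{rid}")
    return lines
```

Join the result with `"\n"` and save it with a `.bed` extension before uploading it to the Table Browser.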
Mutant p53 ChIP-seq overlap
To check whether mutant p53 ChIP-seq sites overlap the significant regions, extract Oth.ALL.05.TP53.AllCell_38.bed.lrz and copy the resulting BED file to the ./data directory:
lrunzip Oth.ALL.05.TP53.AllCell_38.bed.lrz
cp Oth.ALL.05.TP53.AllCell_38.bed ./data/Oth.ALL.05.TP53.AllCell_38.bed
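Conceptually, the overlap check keeps each significant region that shares at least one base with a mutant p53 ChIP-seq peak. A pure-Python sketch, assuming both inputs are already parsed into (chrom, start, end) tuples in BED coordinates; the function name is hypothetical. On full-size files, pybedtools (listed among the dependencies) does the same with something like `BedTool("regions.bed").intersect("peaks.bed", u=True)`, where the file names are placeholders.

```python
# Hypothetical interval-overlap check in BED coordinates (0-based, half-open).
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end)


def overlapping_regions(regions: List[Interval], peaks: List[Interval]) -> List[Interval]:
    """Return the regions that overlap at least one peak by >= 1 bp."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()
    hits = []
    for chrom, start, end in regions:
        for p_start, p_end in by_chrom.get(chrom, []):
            if p_start >= end:
                break  # peaks are sorted by start, so later ones cannot overlap
            if p_end > start:
                hits.append((chrom, start, end))
                break
    return hits
```

With half-open intervals, a peak starting exactly where a region ends does not count as an overlap, matching bedtools' default behaviour.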
Software versions
Python 3.8.6
jupyter-client 6.1.7
jupyterlab 2.2.8
matplotlib 3.3.2
numpy 1.19.2
scikit-image 0.17.2
scikit-learn 0.23.2
PyYAML 5.3.1
scipy 1.5.2
seaborn 0.11.0
pandas 1.1.3
statsmodels 0.12.0
pybedtools 0.8.1
R version 3.6.3
BiocManager 1.30.10
edgeR 3.28.1
TCGAbiolinks 2.14.1
p53retriever 1.2.0
UpSetR 1.4.0
dbplyr 1.4.4
SummarizedExperiment 1.16.1