p53_mutant_atac

Differential chromatin accessibility landscape of gain-of-function mutant p53 tumours

doi: https://doi.org/10.1101/2020.11.24.395913

This repository contains code and data used for the analysis performed in the paper.


How to use

The analysis is performed with a set of Jupyter notebooks (Python and R). There are two ways to run them:
1. Docker image: includes all the notebooks, the pre-required data* and the libraries used.
2. Alternatively, download all the notebooks from this repository along with data.tar.lrz, and run them with the required packages installed.

All notebooks are documented, so you can check which data each one needs (and which folder it should be in) and what output it generates.


1. Using Docker

This docker image contains all the required R and Python libraries, along with the code required for generating all result files and figures.

## to pull the docker image
docker pull bhavya2017/p53_mutant_atac:latest
## to run the docker image
docker run --rm -p {port}:8888 -e JUPYTER_ENABLE_LAB=yes -v {path_to_your_working_directory}:/home/jovyan/work bhavya2017/p53_mutant_atac:latest

2. Alternative Method

If you are not using Docker to run the notebooks, extract the data archive and create the following directories in your current path.

### Extract the data folder
lrzuntar data.tar.lrz
mkdir Results
mkdir plots
mkdir profile
mkdir Results/p53retriever
mkdir profile/TF
mkdir profile/histone
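
The same directory layout can also be created from Python, for example from the first cell of a notebook. This is a minimal sketch, not part of the repository: the function name `make_output_dirs` is hypothetical, and the directory names are copied from the `mkdir` commands above.

```python
from pathlib import Path

# Directory names copied from the mkdir commands above.
OUTPUT_DIRS = [
    "Results",
    "plots",
    "profile",
    "Results/p53retriever",
    "profile/TF",
    "profile/histone",
]


def make_output_dirs(base="."):
    """Create every output directory under `base`; existing ones are left alone."""
    created = []
    for name in OUTPUT_DIRS:
        path = Path(base) / name
        # parents=True creates missing parent folders; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `make_output_dirs()` with no argument creates the folders in the current working directory. Unlike plain `mkdir`, rerunning it does not fail when a folder already exists.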

./data: all the input data for the analysis: limma, JASPAR, histone, HOMER de novo, non-B-form elements, expression (RSEM), gene copy number, and cancer driver genes

./plots: all the figures created by the notebooks

./profile: files used to create profile plots with deepTools (plotHeatmap and plotProfile)

./Results: files from limma, HOMER input, UpSet, and TCGAbiolinks

Notebook Details

Run the notebooks in the given order, as they are interdependent.

1_Pre-processing: Python notebook, to create the input files for the limma differential accessibility (Acc) analysis. The raw data can either be downloaded from the GDC or taken from ./data ({BRCA/COAD}_rawcounts.tsv)

2_Differential_ACC: R notebook, to perform the differential accessibility analysis; requires the condition file and the raw counts file

3_Differential_exp: R notebook, to perform differential expression analysis through TCGAbiolinks on the samples from the above analysis that have expression data

4_Segment_copy_number: R notebook, to fetch the segment copy number scores for the samples from the Acc analysis

5_p53RE: R notebook, to find p53 response elements in the significant regions found in 2_Differential_ACC. Requires the FASTA sequences of the regions as input

6_BRCA_analysis: Python notebook, to explore and annotate the significant regions found in breast (main figures)

7_COAD_analysis: Python notebook, to explore and annotate the significant regions found in colon (main figures)

8_Upset_plot: R notebook, to plot Figure 1E and 1F for breast and colon
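
Because the notebooks depend on each other, a small helper can make the run order explicit. The sketch below is illustrative only: it assumes the notebook files use the names listed above with an `.ipynb` extension, and that the R notebooks have a working R kernel installed. The helper names `notebook_order` and `nbconvert_commands` are hypothetical. It sorts names by their numeric prefix and builds one `jupyter nbconvert --execute` command per notebook without running anything.

```python
import re


def notebook_order(names):
    """Sort notebook names by their leading integer prefix (1_, 2_, ...)."""
    def prefix(name):
        match = re.match(r"(\d+)_", name)
        if match is None:
            raise ValueError(f"no numeric prefix in {name!r}")
        return int(match.group(1))

    return sorted(names, key=prefix)


def nbconvert_commands(names):
    """Build one nbconvert command per notebook, in run order (nothing is executed)."""
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", name]
        for name in notebook_order(names)
    ]
```

Each returned command is an argument list that could be passed to `subprocess.run(cmd, check=True)`, so that a failure in an early notebook stops the later ones from running on missing inputs.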


*Required data

Enhancer

To obtain results for enhancer interactions in the significant peaks, the GeneHancer dataset is required. The dataset can be obtained as follows:

1. Go to the UCSC Table Browser: https://genome.ucsc.edu/cgi-bin/hgTables.
2. Upload the significant peaks in BED format.
3. Select the following: track: GeneHancer Regulatory Interaction cluster view (double elite); assembly: hg38; output: BED.
4. Download the output and copy it to the ./data directory.

cp {file_name} ./data/gene_enhancer_2017.bed_hg38.bed
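
Step 2 above expects the significant peaks in BED format, which is tab-separated with a chromosome name, a 0-based start and an exclusive end. The following is a minimal sketch of writing such a file. The function names and the `(chrom, start, end)` tuple layout are assumptions for illustration; the notebooks may store peaks differently.

```python
def to_bed_lines(peaks):
    """Format (chrom, start, end) tuples as tab-separated BED3 lines.

    Lines are sorted by chromosome name (lexicographically), then start, then end.
    """
    lines = []
    for chrom, start, end in sorted(peaks, key=lambda p: (p[0], int(p[1]), int(p[2]))):
        start, end = int(start), int(end)
        # BED intervals are half-open, so the end must be greater than the start.
        if start < 0 or end <= start:
            raise ValueError(f"invalid interval: {chrom}:{start}-{end}")
        lines.append(f"{chrom}\t{start}\t{end}")
    return lines


def write_bed(peaks, path):
    """Write the peaks to `path`, one BED3 line per peak."""
    with open(path, "w") as handle:
        handle.writelines(line + "\n" for line in to_bed_lines(peaks))
```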

Mutant p53 ChIP-seq overlap

To check whether mutant p53 ChIP-seq sites overlap the significant regions, extract Oth.ALL.05.TP53.AllCell_38.bed.lrz and copy the resulting BED file to the ./data directory.

lrunzip Oth.ALL.05.TP53.AllCell_38.bed.lrz
cp Oth.ALL.05.TP53.AllCell_38.bed ./data/Oth.ALL.05.TP53.AllCell_38.bed
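
The overlap check itself can be sketched in plain Python. This is an illustrative stand-in rather than the notebooks' implementation (pybedtools appears in the required packages below and may be what they use). The function names are hypothetical, and intervals are assumed to be `(chrom, start, end)` tuples in BED's half-open coordinates.

```python
def overlaps(a, b):
    """True when two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def peaks_with_chip_overlap(peaks, chip_sites):
    """Return the peaks that overlap at least one ChIP-seq site, in input order."""
    # Group sites by chromosome so each peak is only compared with sites on its own chromosome.
    by_chrom = {}
    for site in chip_sites:
        by_chrom.setdefault(site[0], []).append(site)
    return [
        peak
        for peak in peaks
        if any(overlaps(peak, site) for site in by_chrom.get(peak[0], []))
    ]
```

Intervals that merely touch, such as `("chr1", 300, 400)` and `("chr1", 400, 500)`, are not counted as overlapping, because the end coordinate is exclusive.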


Required Packages

Python 3.8.6

jupyter-client                6.1.7
jupyterlab                    2.2.8
matplotlib                    3.3.2
numpy                         1.19.2
scikit-image                  0.17.2
scikit-learn                  0.23.2
PyYAML                        5.3.1
scipy                         1.5.2
seaborn                       0.11.0
pandas                        1.1.3
statsmodels                   0.12.0
pybedtools                    0.8.1

R version 3.6.3

BiocManager                   1.30.10
edgeR                         3.28.1
TCGAbiolinks                  2.14.1
p53retriever                  1.2.0
UpSetR                        1.4.0
dbplyr                        1.4.4
SummarizedExperiment          1.16.1