Snakemake-based pipeline for processing ChIP-seq and ATAC-seq datasets, from raw data QC and alignment to coverage visualization and peak calling.
During the peak calling steps, `chipseq-smk-pipeline` automatically matches each signal file with a control file by name proximity.
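To illustrate what "name proximity" can mean, here is a minimal Python sketch. It is not the pipeline's implementation: it assumes that control files are recognized by the substring `input` in their names and that proximity is measured as the length of the longest common name prefix. `is_control` and `match_controls` are hypothetical helpers.

```python
import os
from typing import Dict, List, Optional


def is_control(name: str) -> bool:
    # Assumption: control (input) samples contain "input" in their names
    return "input" in name.lower()


def match_controls(names: List[str]) -> Dict[str, Optional[str]]:
    """Map each signal name to the control sharing its longest common prefix."""
    controls = [n for n in names if is_control(n)]
    matches: Dict[str, Optional[str]] = {}
    for name in names:
        if is_control(name):
            continue
        best, best_len = None, 0
        for control in controls:
            prefix_len = len(os.path.commonprefix([name, control]))
            if prefix_len > best_len:  # ties keep the first control seen
                best, best_len = control, prefix_len
        matches[name] = best  # None if no control shares any prefix
    return matches


if __name__ == "__main__":
    print(match_controls(["CD14_H3K4me3_rep1", "CD14_input", "CD4_H3K4me3", "CD4_input"]))
    # → {'CD14_H3K4me3_rep1': 'CD14_input', 'CD4_H3K4me3': 'CD4_input'}
```

A prefix-based score keeps samples from the same cell type or donor together when names follow a `<cell>_<mark>` pattern. A different naming scheme may need a different similarity measure.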
Input FASTQ files
The pipeline aligns FASTQ or gzipped FASTQ reads, as configured in `config.yaml`.
The reads folder is a path relative to the pipeline working directory and is set by the `fastq_dir` property.
The FASTQ file extension is set by the `fastq_ext` property, e.g. `fq`, `fq.gz`, `fastq` or `fastq.gz`.
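This minimal sketch shows how `fastq_dir` and `fastq_ext` could select the input reads. It is not the pipeline's code. It assumes that a sample name is the file name with the `.<fastq_ext>` suffix removed. `samples_from_names` and `find_fastq_samples` are hypothetical helpers.

```python
from pathlib import Path
from typing import Iterable, List


def samples_from_names(file_names: Iterable[str], fastq_ext: str) -> List[str]:
    """Keep names ending in '.<fastq_ext>' and strip that suffix."""
    suffix = "." + fastq_ext
    return sorted(name[: -len(suffix)] for name in file_names if name.endswith(suffix))


def find_fastq_samples(work_dir: str, fastq_dir: str, fastq_ext: str) -> List[str]:
    # fastq_dir is interpreted relative to the pipeline working directory
    reads_dir = Path(work_dir) / fastq_dir
    return samples_from_names((p.name for p in reads_dir.iterdir() if p.is_file()), fastq_ext)
```

Under this suffix check, `fastq_ext=fq` does not pick up `reads.fq.gz`. The extension must match the files exactly.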
Input BAM files
Use the `start_with_bams=True` config option to start from existing BAM files.
The pipeline then starts with the BAM files in the `work_dir/bams` folder.
| Path | Description |
|---|---|
| `config.yaml` | Default pipeline options |
| `trimmed` | Trimmed FASTQ files, if the `trim_reads` option is `True` |
| `bams` | BAM files with aligned reads, MAPQ >= 30 |
| `bw` | BAM coverage visualization using deepTools |
| `macs2` | MACS2 peaks |
| `sicer` | SICER peaks |
| `span` | SPAN peaks |
| `qc` | QC reports |
| `multiqc` | MultiQC reports for different steps |
| `logs` | Shell command logs |
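The set of output folders depends on the config flags. The rough sketch below makes two assumptions: `bams`, `qc`, `multiqc` and `logs` are always produced, and each remaining folder appears only when the matching flag named in this README (`trim_reads`, `bw`, `macs2`, `sicer`, `span`) is enabled. `expected_dirs` is a hypothetical helper, not part of the pipeline.

```python
from typing import Dict, List

# Assumption: folders produced by every run
ALWAYS = ["bams", "qc", "multiqc", "logs"]
# Output folder -> config flag that enables it (flag names from this README)
OPTIONAL = {"trimmed": "trim_reads", "bw": "bw", "macs2": "macs2", "sicer": "sicer", "span": "span"}


def expected_dirs(config: Dict[str, bool]) -> List[str]:
    """Return the sorted list of output folders expected for a given config."""
    dirs = list(ALWAYS)
    dirs += [folder for folder, flag in OPTIONAL.items() if config.get(flag, False)]
    return sorted(dirs)
```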
The pipeline requires `conda`.

- If `conda` is not installed, follow the installation instructions on the Conda website.
- Navigate to the repository directory.

Create a Conda environment for `snakemake`:
```bash
$ conda env create --file environment.yaml --name snakemake
```
Activate the newly created environment:
```bash
$ source activate snakemake
```
On Ubuntu, please ensure that `gawk` is installed:
```bash
$ sudo apt-get install gawk
```
Run the pipeline starting from FASTQ reads:
```bash
$ snakemake -p -s <chipseq-smk-pipeline>/Snakefile \
  all [--cores <cores>] --use-conda --directory <work_dir> \
  --config fastq_dir=<fastq_dir> genome=<genome> --rerun-incomplete
```
By default, the pipeline doesn't perform coverage visualization or launch peak callers.
Add `bw=True`, `macs2=True`, `sicer=True` and `span=True` to the `--config` options to create BigWig (`bw`) coverage files and call peaks.
To launch MACS2 in `--broad` mode, use the following config:
```bash
$ snakemake -p -s <chipseq-smk-pipeline>/Snakefile \
  all [--cores <cores>] --use-conda --directory <work_dir> \
  --config fastq_dir=<fastq_dir> genome=<genome> \
  macs2=True macs2_mode=broad macs2_params="--broad --broad-cutoff 0.1" macs2_suffix=broad0.1 \
  --rerun-incomplete
```
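Because `macs2_params` contains spaces, it must reach Snakemake as a single `key=value` argument. In the shell, the double quotes take care of this. The minimal Python sketch below builds the same invocation as an argument list, which avoids shell quoting altogether. The helper names `snakemake_argv` and `as_shell_line` and the paths in the usage block are hypothetical.

```python
import shlex
from typing import Dict, List


def snakemake_argv(snakefile: str, work_dir: str, config: Dict[str, str], cores: int = 1) -> List[str]:
    """Build the README's `snakemake ... all` invocation as an argument list."""
    argv = ["snakemake", "-p", "-s", snakefile, "all",
            "--cores", str(cores), "--use-conda", "--directory", work_dir,
            "--config"]
    # One token per key=value pair, so values containing spaces stay intact
    argv += [f"{key}={value}" for key, value in config.items()]
    argv.append("--rerun-incomplete")
    return argv


def as_shell_line(argv: List[str]) -> str:
    # shlex.join adds the quoting a POSIX shell needs to split argv back unchanged
    return shlex.join(argv)


if __name__ == "__main__":
    argv = snakemake_argv(
        "chipseq-smk-pipeline/Snakefile", "work_dir",
        {"fastq_dir": "reads", "genome": "hg19", "macs2": "True",
         "macs2_mode": "broad", "macs2_params": "--broad --broad-cutoff 0.1",
         "macs2_suffix": "broad0.1"},
        cores=4)
    print(as_shell_line(argv))
```

The resulting list can be passed directly to `subprocess.run(argv, check=True)` when Snakemake is on `PATH`.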
See `config.yaml` for the complete list of parameters. Use `--config` to override default options from the `config.yaml` file.
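Values passed with `--config` take precedence over the defaults in `config.yaml`. The minimal sketch below models that merge in plain Python. Snakemake performs its own type conversion of `--config` values, so this is not Snakemake's code. The simplified `parse_value` (bool, then int, then float, else string) and `apply_overrides` are hypothetical helpers, and the defaults in the usage check are made up.

```python
from typing import Any, Dict, List


def parse_value(raw: str) -> Any:
    # Simplified conversion: bool, then int, then float, else keep the string
    if raw in ("True", "False"):
        return raw == "True"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def apply_overrides(defaults: Dict[str, Any], tokens: List[str]) -> Dict[str, Any]:
    """Apply key=value tokens on top of a copy of the config.yaml defaults."""
    merged = dict(defaults)  # leave the defaults untouched
    for token in tokens:
        key, _, raw = token.partition("=")
        merged[key] = parse_value(raw)
    return merged


if __name__ == "__main__":
    made_up_defaults = {"macs2": False, "span_clip": 0.2, "genome": "hg38"}
    print(apply_overrides(made_up_defaults, ["macs2=True", "span_clip=0.4", "genome=hg19"]))
    # → {'macs2': True, 'span_clip': 0.4, 'genome': 'hg19'}
```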
To produce the rules DAG, append `--forceall --rulegraph | dot -Tpdf > rules.pdf` to the pipeline command. `--rulegraph` prints the graph in Graphviz format, and `dot` renders it to PDF.
Configure a Snakemake profile named `cluster` for your cluster system:
```bash
$ mkdir -p ~/.config/snakemake
$ cd ~/.config/snakemake
$ cookiecutter https://github.com/iromeo/generic.git
```
Example of ATAC-seq processing on a qsub cluster:
```bash
$ snakemake -p -s <chipseq-smk-pipeline>/Snakefile \
  all --use-conda --directory <work_dir> \
  --profile cluster --cluster-config cluster_config.yaml --jobs 150 \
  --config fastq_dir=<fastq_dir> genome=<genome> \
  bowtie2_params="-X 2000 --dovetail" \
  macs2=True macs2_params="-q 0.05 -f BAMPE --nomodel --nolambda -B --call-summits" \
  span=True span_fragment=0 span_bg_sensitivity=1.0 span_clip=0.4 --rerun-incomplete
```
Download the example `fastq.gz` files from the CD14_chr15_fastq folder.
These files are filtered to human hg19 chr15 to reduce their size and speed up computations.
Launch `chipseq-smk-pipeline`:
```bash
$ snakemake -p -s <chipseq-smk-pipeline>/Snakefile \
  all --use-conda --cores all --directory <work_dir> \
  --config fastq_ext=fastq.gz fastq_dir=<work_dir> genome=hg19 macs2=True sicer=True span=True \
  --rerun-incomplete
```
- Learn more about the Snakemake workflow management system.
- Developed with the SnakeCharm plugin for the PyCharm IDE by JetBrains Research BioLabs.