Paired-end Shallow Whole Genome Sequencing Snakemake Pipeline

This is a snakemake pipeline for the analysis of paired-end shallow whole genome sequencing data. The pipeline is designed to work with raw fastq files generated from Illumina HiSeq or NovaSeq. It includes the following steps:

Align reads to a reference genome using BWA
Index the resulting BAM file
Mark duplicates using Picard
Index the marked BAM file
Generate fastqc report for raw, aligned, and marked reads
Generate multiqc report for all three reports

Setup

To use this pipeline, first clone the repository and navigate to the project directory:

git clone https://github.com/DucoG/PEsWGS-alignment-snakemake.git
cd PEsWGS-alignment-snakemake

Config File

The config_snake.yaml contains the location of the reference genome required. Please update this file with your own values before running the pipeline.

Usage

Once the environment and config file have been set up, the pipeline can be run using snakemake. Before running the entire pipeline for all your samples, it is advisable to first check if all settings are correct using a dry run using the -n option for a dry-run and -p for printing the ran commands:

snakemake -np --use-conda

To run the entire pipeline, use the following command:

snakemake --use-conda -j <NUMBER_OF_JOBS>

This will generate marked BAM files and QC reports for the data in data/raw. Your directory will look as follows:

.
├── data                        <- contains all input and output files for the pipeline
│   ├── aligned
│   ├── marked
│   ├── raw
├── logs                        <- contains log files for each step of the pipeline
│   ├── index_bam
│   └── mark_duplicates
└── qc_outputs                  <- contains output files from quality control steps
    ├── marked
    │   ├── fastqc_output
    │   └── multiqc_output
    │       └── multiqc_data
    └── raw
        ├── fastqc_output
        └── multiqc_output
            └── multiqc_data

To run a specific rule, for example, to run only the fastq2bam rule, use the following command:

snakemake data/aligned/<SAMPLE_ID>_hg38.bam --use-conda

This will generate a BAM file for the specified sample in data/aligned.

Note

This pipeline does not trim the sequences automatically. To determine if trimming is necessary, start by generating the raw multiqc report. Based on that, trimming can be done on the raw data files before aligning.

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
.gitignore		.gitignore
README.md		README.md
Snakefile		Snakefile
config_snake.yaml		config_snake.yaml
dag.svg		dag.svg

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Paired-end Shallow Whole Genome Sequencing Snakemake Pipeline

Setup

Config File

Usage

Note

About

Releases

Packages

Languages

DucoG/PEsWGS-alignment-snakemake

Folders and files

Latest commit

History

Repository files navigation

Paired-end Shallow Whole Genome Sequencing Snakemake Pipeline

Setup

Config File

Usage

Note

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages