Pharmacogenomics pipeline implemented in hydra
To run this workflow, the following tools need to be available:
- Add all sample ids to samples.tsv in the column sample.
- Add all sample data information to units.tsv. Each row represents a fastq file pair with corresponding forward and reverse reads. Also indicate the sample id, run id, lane number, and adapter.
- BAM file with duplicates marked
- Reference genome
- dbSNP database in VCF file format
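As a rough illustration of the two sample sheets described above, the following sketch writes a one-sample samples.tsv and a matching units.tsv. Only the sample column in samples.tsv is named in this README; the units.tsv column names (run, lane, fastq1, fastq2, adapter), the sample id, the file paths, and the adapter string are all assumptions made for the example, so check them against the workflow's own schema before use.

```shell
# samples.tsv: one sample id per row under the "sample" column.
printf 'sample\n' > samples.tsv
printf 'sample1\n' >> samples.tsv   # hypothetical sample id

# units.tsv: one row per fastq file pair.
# NOTE: these column names are assumed for illustration, not taken from the workflow.
printf 'sample\trun\tlane\tfastq1\tfastq2\tadapter\n' > units.tsv
# Hypothetical run id, lane, read paths, and placeholder adapter sequence.
printf 'sample1\trun1\tL001\treads/sample1_L001_R1.fastq.gz\treads/sample1_L001_R2.fastq.gz\tACGTACGT\n' >> units.tsv
```

Both files are tab-separated; a sample sequenced on several lanes would get one units.tsv row per lane, all sharing the same sample id.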
The workflow repository contains a small test dataset in .tests/integration, which can be run like so:
cd .tests/integration
snakemake -s ../../Snakefile -j1 --use-singularity
# alternative command:
snakemake -s ../../workflow/Snakefile -j1 --use-singularity --configfile config/config.yaml
# generate DAG:
snakemake --cores 1 -s workflow/Snakefile --configfile config/config.yaml --rulegraph | dot -Tsvg > ./images/dag.svg
The workflow is designed for WGS data, which means large datasets that require substantial compute power. For HPC clusters, it is recommended to use a cluster profile and run something like:
snakemake -s /path/to/Snakefile --profile my-awesome-profile
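A Snakemake profile is a directory containing a config.yaml whose keys mirror the long command-line option names. The sketch below creates a minimal one in the current directory; the values are illustrative assumptions, not settings shipped with this workflow. Scheduler submission settings are left out on purpose, because they depend on the cluster and on the Snakemake version in use.

```shell
# Create a local profile directory (the name matches the example command above).
mkdir -p my-awesome-profile

# Keys correspond to snakemake long options; all values are assumed examples.
cat > my-awesome-profile/config.yaml <<'EOF'
jobs: 50                # maximum number of jobs run in parallel
use-singularity: true   # same effect as passing --use-singularity
keep-going: true        # continue with independent jobs if one fails
rerun-incomplete: true  # re-run jobs whose output is incomplete
latency-wait: 60        # seconds to wait for output files on shared filesystems
EOF
```

With the profile in the working directory, it can be passed by relative path, for example `snakemake -s /path/to/Snakefile --profile ./my-awesome-profile`.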