quicksand

See readthedocs for the full documentation of the pipeline.

Description

quicksand (quick analysis of sedimentary ancient DNA) is an open-source Nextflow pipeline designed for rapid and accurate taxonomic classification of mammalian mitochondrial DNA (mtDNA) in aDNA samples. quicksand combines fast alignment-free classification using KrakenUniq with downstream mapping (BWA), post-classification filtering, and ancient DNA authentication. quicksand is optimized for speed and portablity and requires either Singularity or Docker.

Workflow

Quickstart

Requirements

To run Nextflow, you need a POSIX-compatible system (e.g., Linux or macOS). quicksand was developed and tested on Linux (x86_64 architecture)

To run quicksand, please install

Nextflow v22.10 or larger
Singularity or Docker

Note: To run quicksand in singularity, your kernel needs to support user-namespaces (see here or here).

Prepare Input

The input for quicksand is a directory with user-supplied files in BAM or FASTQ format. Adapter-trimming, overlap-merging and sequence demultiplexing need to be performed by the user prior to running quicksand. Provide the directory with the --split flag

Caution

Each input-file should correspond to a single sequence-library. The processing of merged libraries with quicksand can lead to sequence loss because of the PCR-deduplication step with bam-rmdup

Download Test-file

As a test file, download a mammalian mtDNA capture library from Denisova Cave Layer 20 (published in Zavala et al. 2021)

wget -P split \
ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR564/ERR5640810/A20896.bam

Create Reference Database

The required KrakenUniq database, the reference genomes for mapping and the bed-files for low-complexity filtering are available on the MPI EVA FTP Servers. Custom versions of the reference material can be created with the quicksand-build pipeline

Create Test Database

For the quickstart of quicksand, create a small test-database containing only the Hominidae, Bovidae and Hyaenidea mtDNA reference genomes (~150 genomes, runtime: ~3-5 minutes, size ~5GB).

(to reduce database size and runtime, use only --include Hominidae)

nextflow run mpieva/quicksand-build -r v3.1 \
  --include  Hominidae,Bovidae,Hyaenidae \
  --outdir   refseq \
  -profile   singularity

Download Full Database

To download the full reference database (~60GB), use this command:

latest=$(curl http://ftp.eva.mpg.de/quicksand/LATEST)
wget -r -np -nc -nH --cut-dirs=3 --reject="*index.html*" -q --show-progress -P refseq http://ftp.eva.mpg.de/quicksand/build/$latest

This can take several hours! For testing quicksand its recommended to build a small database (see above)

Run quicksand

quicksand is executed directly from github. With the test-database created and the test-file downloaded, run the pipeline as follows:

# set this if you encounter a heap-space error to increase the memory that is used by nextflow
export NXF_OPTS="-Xms10g -Xmx15g" # increase or decrease the numbers as required

nextflow run mpieva/quicksand -r v2.5 \
  --db        refseq/kraken/Mito_db_kmer22/ \
  --genomes   refseq/genomes/ \
  --bedfiles  refseq/masked/ \
  --split     split/ \
  -profile    singularity #mind the single dash!

Output

Please see the documentation for a comprehensive description of the output files and structure!

The main summary table (final_report.tsv) contains one line per input file and detected family (passing the --krakenuniq_min_kmers and --krakenuniq_min_reads cutoff). The following columns are reported:

RG: Name of the file analyzed
ReadsRaw: Raw number of sequences in the file (paired reads only counted once)
ReadsFiltered Number of sequences after filtering for --bamfilterflag
ReadsLengthfiltered: Number of sequences after additonal filtering for sequence length (--bamfilter_length_cutoff)
Kmers: KrakenUniq: Number of unique kmers used for classification (format: "best" and "(family)")
KmerCoverage: KrakenUniq: Kmer coverage for that classification (format: "best" and "(family)")
KmerDupRate: KrakenUniq: Kmer duplication rate for that classification (format: "best" and "(family)")
ExtractLVL: "f" (family) or "o" (order), set by --taxlvl
ReadsExtracted: Number of sequences assigned by KrakenUniq
Order: Detected Order
Family: Detected Family
Species: The reference-genome used for the mapping of 'ReadsExtracted'
Reference: "best" or "fixed" (--fixed)
ReadsMapped: Number of sequences mapped, passing the mapping quality-cutoff(--mapbwa_quality_cutoff)
ProportionMapped: ReadsMapped / ReadsExtracted
ReadsDeduped: Number of unique sequences (removed PCR duplicates)
DuplicationRate: ReadsMapped / ReadsDeduped
CoveredBP: covbases stat of the samtools coverage command. The number of bases covered in the reference genome by mapped sequences (max: ~17000)
ReadsBedfiltered: Number of unique sequences after applying low-complexity bed-filtering
PostBedCoveredBP: Number of bases covered in the reference genome by bedfiltered sequences
FamPercentage: Percentage of unique sequences in the alignment from the total number of unique sequences in the sample (For PSF-filter)
Ancientness: "-", "+" or "++". Significance level of the C-to-T or G-to-A deamination rate in the alignment (for G-to-A, use --doublestranded)
ReadsDeam(1term): Number of unique sequences with a C-to-T or G-to-A substitution in the terminal base positions (for G-to-A, use --doublestranded)
ReadsDeam(3term): Number of unique sequences with a C-to-T or G-to-A substitution in the terminal three positions (for G-to-A, use --doublestranded)
Deam5(95ci): The terminal C-to-T substitution-rate (and the 95% confidence interval) for the 5' end
Deam3(95ci): The terminal C-to-T or G-to-A substitution-rate (and the 95% confidence interval) for the 3' end (for G-to-A, use --doublestranded)
Deam5Cond(95ci): The terminal C-to-T substitution-rate (and the 95% confidence interval) for the 5' end, conditioned on a substitution at the opposite end
Deam3Cond(95ci): The terminal C-to-T (or G-to-A) substitution-rate (and the 95% confidence interval) for the 3' end, conditioned on a substitution at the opposite end
MeanFragmentLength: Mean fragment length of all unique sequences
MeanFragmentLength(3term): Mean fragment length of all 'ancient' sequences
Coverage: meandepth of the samtools coverage command. Corresponds to the depth of coverage in the alignment.
Breadth: coverage of the samtools coverage command / 100: The proportion of bases covered by mapped sequences in the reference genome
ExpectedBreadth: Expected breadth based on 'Coverage'. Calculated using the formula: ExpectedBreadth = 1 - e^(-0.833 × Coverage)
ProportionExpectedBreadth: Breadth / ExpectedBreadth (For PEB-filter)

Common Errors

A collection of common nextflow-errors and how to solve them

Heap Space

 -- Check '.nextflow.log' file for details
ERROR ~ Java heap space

Heap space errors can occur if nextflow itself requires more memory than provided by default (e.g. when screening too many samples in parallel). You can increase the heap-space as needed (e.g., to 5gb) with

export NXF_OPTS="-Xms5g -Xmx5g"

Citation

If you use quicksand in your research, please cite the quicksand publication as follows:

Szymanski, Merlin, Johann Visagie, Frederic Romagne, Matthias Meyer, and Janet Kelso.
“quick analysis of sedimentary ancient DNA using quicksand”, Molecular Biology and Evolution, 2025.
https://doi.org/10.1093/molbev/msaf305.

Honorable Mentions

This pipeline uses code inspired by the nf-core initative, reused here under the MIT license.

Ewels, Philip A., Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso, and Sven Nahnsen. 2020. “The Nf-core Framework for Community-curated Bioinformatics Pipelines”. Nature Biotechnology 38 (3): 276–78. https://doi.org/10.1038/s41587-020-0439-x.

Name		Name	Last commit message	Last commit date
Latest commit History 98 Commits
assets		assets
conf		conf
docs/source		docs/source
modules/local		modules/local
workflows		workflows
.gitignore		.gitignore
.readthedocs.yaml		.readthedocs.yaml
CHANGELOG.md		CHANGELOG.md
LICENSE.md		LICENSE.md
README.md		README.md
main.nf		main.nf
nextflow.config		nextflow.config

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

quicksand

Description

Workflow

Quickstart

Requirements

Prepare Input

Download Test-file

Create Reference Database

Create Test Database

Download Full Database

Run quicksand

Output

Common Errors

Heap Space

Citation

Honorable Mentions

About

Uh oh!

Releases 5

Uh oh!

Languages

License

mpieva/quicksand

Folders and files

Latest commit

History

Repository files navigation

quicksand

Description

Workflow

Quickstart

Requirements

Prepare Input

Download Test-file

Create Reference Database

Create Test Database

Download Full Database

Run quicksand

Output

Common Errors

Heap Space

Citation

Honorable Mentions

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 5

Uh oh!

Languages