Pupmapper: A Pileup Mappability Calculator
The Pileup Mappability metric can be used to quickly identify regions which may be more difficult to perform variant calling with short-read WGS data. pupmapper
was created to allow users to quickly convert k-mer mappability scores to pileup mappability.
I would recommend running pupmapper
on k-mer mappability scores generated by the Genmap software.
NOTE: I just added a subcommand run_all
that will run Genmap on the input genome and then calculate pileup mappability scores from the genmap results.
**Pileup mappability is useful because it gives a sense of uniquemess of all possible reads (of defined length) that could align to a given position.**
Derrien, T, (2012). Fast Computation and Applications of Genome Mappability. PLOS ONE 7(1): e30377. https://doi.org/10.1371/journal.pone.0030377
Pockrandt C, (2020) GenMap: ultra-fast computation of genome mappability, Bioinformatics, Volume 36, Issue 12, June 2020, Pages 3687–3692, https://doi.org/10.1093/bioinformatics/btaa222
Lee H, Schatz MC. (2012). Genomic dark matter: the reliability of short read mapping illustrated by the genome mappability score, Bioinformatics, Volume 28, Issue 16, August 2012, Pages 2097–2105, https://doi.org/10.1093/bioinformatics/bts330
pupmapper
can be installed by cloning this repository and installing with pip
.
git clone [email protected]:maxgmarin/pupmapper.git
cd pgqc
pip install .
🚧 Check back soon 🚧
1) run_all
- Run all genmap pre-processing steps (indexing, k-mer mappability) and then calculate pileup mappability
pupmapper run_all -i Input.Genome.fasta -o output_directory/ -k 50 -e 1
The above command will first use genmap to calculate k-mer mappability scores for the input genome and then calculate pileup mappability scores.
pupmapper run_pileup -i kmap.K50E0.bedgraph -o pupmap.K50E0.bedgraph -k 50
The above command will calculate pileup mappability scores based on input k-mer mappabilities (k= 50 bp, E = 0 mismatches) that were generated using genmap
.
If you wish to run an pupmapper
on a small test sequence (15 bp), you can run the following commands:
cd tests/data/Genmap_Ex1/Ex1_gm_output
pupmapper run_pileup -i Ex1_Kmap_K4E0.bedgraph -o Ex1_Pmap_K4E0.bedgraph -k 4
The input file (Ex1_Kmap_K4E0.bedgraph
) was generated by running genmap
on the tests/data/Genmap_Ex1/Ex1.genome.fasta
with a k-mer size of 4 bp and a max mismatch of 0 (K=4,E=0).
$pupmapper run_pileup --help
usage: pupmapper run_pileup [-h] -i INPUT -o OUTPUT -k KMER_LEN
Command for calculating genome wide pileup mappability based on k-mer mappability values
optional arguments:
-h, --help show this help message and exit
-i INPUT, --input INPUT
Input k-mer mappability values in bedgraph format (.bedgraph). Ideally, generated with genmap software
-o OUTPUT, --output OUTPUT
Output table of pileup mappability (.bedgraph)
-k KMER_LEN, --kmer_len KMER_LEN
k-mer length (bp) used to generate the input k-mer mappability values
....
- 1.1) Use
genmap
(with your desired parameters) to calculate k-mer mappability for your genome of interest. (Output to .bedgraph) - 1.2) Use
pupmapper
to calculate pileup mappability from k-mer mappability (output to .bedgraph) - 1.3) Use
awk
andbedtools
to identify regions of the genome which have pileup mappability < 1 (or below your desired threshold).
To calculate pileup mappability with pupmapper
you must first generate k-mer mappability values with genmap. Refer to the getting started section of genmap
's README for more details.
You can use genmap
in two steps:
$ ./genmap index -F /path/to/fasta.fasta -I /path/to/index/folder
2) Calculate k-mer mappability with desired parameters (in this example k = 30 bp and E = up to 2 mismatches)
$ ./genmap map -K 30 -E 2 -I /path/to/index/folder -O /path/to/output/folder -t -w -bg