Command: k-mer abundance histogram #1846

taranglute · 2018-03-06T04:07:37Z

We have downloaded khmer version 2.1.1 and now want to benchmark computing the full k-mer abundance histogram.

Is the following command correct for obtaining "the full k-mer abundance histogram" on a dataset such as human HS3 for the benchmark?

./abundance-dist-single.py -k 25 -T 12 input.fastq output_histo

standage · 2018-03-06T04:58:53Z

Your command looks mostly correct, although:

It's not typical to run commands from the scripts/ directory, so the ./ prefix may not make sense. Did you follow the latest installation instructions? If so, you should be able to execute abundance-dist-single.py from any directory.
The command uses a constant amount of memory, and by default it is a very small amount. For human whole-genome shotgun data, note that dozens of gigabytes of memory must be available if you want accurate k-mer abundances. For example, if you want to allocate 32 gigabytes for computing the k-mer counts, you can add -M 32G to the command.
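
Putting both suggestions together, the adjusted command might look like the sketch below. It assumes khmer is installed so that abundance-dist-single.py is on your PATH; input.fastq and output_histo are the placeholder names from the original question, and the shell variable names are only illustrative.

```shell
# Settings taken from the thread: k = 25, 12 threads, 32 GB for the k-mer counts.
KSIZE=25
THREADS=12
MEMORY=32G

# Run without the ./ prefix; this assumes the khmer scripts were installed onto PATH.
if command -v abundance-dist-single.py >/dev/null 2>&1; then
    abundance-dist-single.py -k "$KSIZE" -T "$THREADS" -M "$MEMORY" input.fastq output_histo
else
    echo "abundance-dist-single.py not found on PATH; check the khmer installation" >&2
fi
```

If the machine has less memory available, lower the value passed to -M, keeping in mind that less memory means less accurate abundances.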
