2025-sourmash-eukaryotic-databases

Build infrastructure for creating sourmash databases from NCBI for all eukaryotic reference genomes.

The strategy is:

use the directsketch plugin to build high rez (scaled=1000, k=21/31/51) for all euks, and put them all in a directory outside of sourmash-db;
(this repo) make the various collection accession/taxid lists directly from NCBI;
(this repo) build the full set of euk collections, downsampled to scaled=10_000

How to install taxdump for taxonkit

You'll need to install the NCBI taxonomy taxdump for taxonkit, which is used by pytaxonkit.

In some stable directory,

wget -c ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
tar -zxvf taxdump.tar.gz
mkdir -p $HOME/.taxonkit
ln -s names.dmp nodes.dmp delnodes.dmp merged.dmp $HOME/.taxonkit

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
Snakefile		Snakefile
environment.yml		environment.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

2025-sourmash-eukaryotic-databases

How to install taxdump for taxonkit

About

Releases

Packages

Languages

License

sourmash-bio/2025-sourmash-eukaryotic-databases

Folders and files

Latest commit

History

Repository files navigation

2025-sourmash-eukaryotic-databases

How to install taxdump for taxonkit

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages